Leveraging Clustering Techniques to Facilitate Metagenomic Analysis
Authors
Abstract
Machine learning clustering algorithms offer efficient methods for metagenomic analysis. This study uses two of them, the self-organizing map and K-means, to cluster data from an environmental sample collected from a hot springs habitat and to provide a visual analysis of that data. A data processing pipeline is described that uses the clustering algorithms to identify which reference genomes should be included in further analysis to determine which organisms may be present in a metagenomic sample. The clustering revealed probable candidates for additional analysis, including a thermophilic, anaerobic bacterium; because such an organism is likely to be found in a hot springs environment, this result helps validate the tools. The machine learning techniques discussed here can serve as a starting point for identifying protein sequences that could act as reference comparisons for a specific metagenomic sample and lead to further study.
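As a rough illustration of the K-means step mentioned in the abstract, the sketch below clusters short DNA sequences by their dinucleotide composition. The abstract does not say which features the study's pipeline uses, so the k-mer frequency representation is an assumption, as are the function names (kmer_profile, kmeans), the toy reads, and the deterministic farthest-first initialization chosen here to keep the example reproducible. The self-organizing map is not shown.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Assumption: k-mer composition is used as the feature representation;
    the abstract does not specify the features used in the study.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on very short input
    return [counts[km] / total for km in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Plain K-means with farthest-first initialization (a simplification)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy reads: two AT-rich and two GC-rich fragments.
reads = ["ATATATATTA", "AATTAATATT", "GCGCGGCCGC", "CCGGCGCGGC"]
profiles = [kmer_profile(r, k=2) for r in reads]
labels, _ = kmeans(profiles, k=2)
print(labels)
```

In a real pipeline the reads would come from the sequenced sample and each cluster would then be compared against candidate reference genomes; that comparison step is outside the scope of this sketch.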
Related articles
Joint Analysis of Multiple Metagenomic Samples
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitatio...
Screening New Microorganisms and Their Useful Genes: From Traditional Methods to Metagenomics
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Microorganisms constitute two thirds of the Earth’s biological diversity. In many environments, 99% of the microorganisms cannot be cultured by standard techniques. Culture-independent methods are required to study the genetic diversity, population structure and ecological roles of the majority of o...
Large Scale Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clusters
Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. High-throughput next generation sequencing is enabling the creation of large metagenomic samples, while at the same time making the clustering problem harder due to the short sequence length supported and sampling of hitherto unknown species...
Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clust...
QuasiAlign: Position Sensitive P-Mer Frequency Clustering with Applications to Genomic Classification and Differentiation
Recent advances in Metagenomics and the Human Microbiome provide a complex landscape for dealing with a multitude of genomes all at once. One of the many challenges in this field is classification of the genomes present in a sample. Effective metagenomic classification and diversity analysis require complex representations of taxa. With this package we develop a suite of tools, based on novel q...
Journal title:
- Intelligent Automation & Soft Computing
Volume 22, Issue
Pages -
Publication date 2016